Text Mining Infrastructure in R
Abstract
Over the last decade, text mining has become a widely used discipline that draws on statistical and machine learning methods. We present the tm package, which provides a framework for text mining applications within R. We survey the text mining facilities available in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification, and string kernels.
Related resources
Introduction to the tm Package Text Mining in R
This vignette gives a short introduction to text mining in R utilizing the text mining framework provided by the tm package. We present methods for data import, corpus handling, preprocessing, metadata management, and creation of term-document matrices. Our focus is on the main aspects of getting started with text mining in R—an in-depth description of the text mining infrastructure offered by ...
Kernel-based machine learning for fast text mining in R
Recent advances in the field of kernel-based machine learning methods allow fast processing of text using string kernels utilizing suffix arrays. kernlab provides both kernel methods’ infrastructure and a large collection of already implemented algorithms and includes an implementation of suffix-array-based string kernels. Along with the use of the text mining infrastructure provided by tm thes...
Benchmarking infrastructure for mutation text mining
BACKGROUND: Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. RESULTS: We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutatio...
Using Text Mining for Understanding Insulin Signalling
In this paper we describe our efforts and experience in using a mix of e-Science and text mining technologies in the context of large-scale integrative biology studies. Using insulin signaling as an application framework, we describe the service-based text mining infrastructure used for the project and present several text mining workflows for performing a number of common tasks encountered...
Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents’ contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...
Publication date: 2008